The distribution of the population of cities has attracted a great deal of attention, in
part because it sharply constrains models of local growth. However, to this day, there is no 
consensus on the distribution below the very upper tail, because available data need to rely 
on the "legal" rather than "economic" definition of cities for medium and small cities. To 
remedy this difficulty, in this work we construct cities "from the bottom up" by clustering 
populated areas obtained from high-resolution data. This method allows us to investigate 
the population and area of cities for urban agglomerations of all sizes. We find that Zipf's 
law (a power law with exponent close to 1) for population holds for cities as small as 12,000 
inhabitants in the USA and 5,000 inhabitants in Great Britain. In addition, the distribution
of city areas is also close to Zipf's law. We provide a parsimonious model with endogenous
city area that is consistent with those findings. (JEL D30, D51, J61, R12) 
1 Introduction



This paper builds on a recently proposed algorithm to construct cities based on geographical
features of high-quality micro data (Rozenfeld et al. 2008), rather than informative but
somewhat arbitrary legal or administrative definitions. It allows us to take a fresh look
at key quantities in urban economics, namely the population and the area of cities. We
find that Zipf's law for population holds quite well, and well below the very upper tail of
the city size distribution, where it had been shown to hold to a good degree of approximation
(Gabaix and Ioannides 2004). We also find that the distribution of city areas follows a
power law, with an exponent close to 1, the Zipf value. These findings help constrain further 
theories of cities and theories of geography. We present a baseline parsimonious model of 
cities, which features endogenous city area, and is consistent with these two key stylized 
facts, as well as others. 



A key difficulty in studying cities is finding a practical way to define them (Zipf 1949;
Krugman 1996; Eaton and Eckstein 1997; Dobkins and Ioannides 2001; Eeckhout 2004; Soo 2005;
Batty 2006). A canonical method involves defining Metropolitan Statistical Areas
(MSAs), obtained in the USA from the US Census Bureau (U.S. Census Bureau 2009). MSAs
are defined for each major agglomeration, and attempt to capture their extent by merging
administratively defined entities, counties in the USA, based on their social or economic
ties. For instance, the MSA of Boston includes not only the administrative unit of Boston,
but also adjacent Cambridge, MA. MSAs derive their appeal from a strong economic logic,
but their construction requires qualitative analysis and is very time-consuming. Therefore,
MSAs have been constructed only for the 276 most populated cities in the USA, and
the corresponding Zipf's law has been documented only for the upper tail of the distribution
(Gabaix and Ioannides 2004; Soo 2005).

Two main alternatives to the MSAs have been proposed in the literature. One method
is to use administrative or legal borders of cities to define the so-called "places", as done
by Eeckhout (2004) and Levy (2009). The analysis of 25,359 places in the USA has suggested
that Zipf's law holds in the upper tail (Levy 2009) but fails in the bulk of the
distribution, as legally defined cities follow a log-normal distribution rather than a power
law (Eeckhout 2004, 2009). The advantage of this definition is that it allows the study of
the distribution of cities of all sizes. Still, it is problematic to define cities through their
fairly arbitrary legal boundaries (the places method treats Cambridge and Boston as two
separate units), and indeed, this is why researchers prefer agglomerations such as MSAs



[...] data ([...] 2008). In particular, Holmes and Lee (2009) consider cities to
be individual cells of six-by-six miles, for which the city size distribution is much
less fat-tailed than Zipf's law. However, this is probably because constraining cities to areas
of six-by-six miles makes it nearly impossible to find a very large city. Hence, because of
these methodological difficulties, the shape of the distribution of agglomerations beneath the few
hundred largest cities is still an open problem.

Here we build on an algorithm, the City Clustering Algorithm (CCA), that was recently
introduced in (Rozenfeld et al. 2008) and based on previous studies done by Makse, Havlin and Stanley
(1995), to build cities "from the bottom up". The algorithm defines a "city" as a maximally
connected cluster of populated sites defined at high resolution. Namely, a population cluster
is made of contiguous populated sites within a prescribed distance $\ell$ that cannot be
expanded: all sites immediately outside the cluster have a population density below a cutoff
threshold. Rather than defining a city as one cell, as done by Holmes and Lee (2009), our
method defines an agglomeration as a maximally connected cluster of potentially many cells.

We find that Zipf's law holds, to a good approximation, in the USA and GB, for both 
populations and areas. We also find that density has only a weak correlation with population 
and area. We propose that the two facts of Zipf's law for populations and areas could serve 
as tight constraints on models of cities. As we can measure area, we wish to model it. Hence, 
we provide a parsimonious urban model that incorporates areas, and generates Zipf's law 
for areas and populations. 

In Section 2 we present the analyzed data and explain the CCA. In Section 3 we present
our results for the population distribution of CCA clusters in the USA and GB. We also
compare the CCA clusters with US Census MSAs and places and present a formal test of
robustness of our clustering method. In Section 4 we show the results of the area distribution
of CCA clusters in the USA and GB and present a study of the correlations between densities,
areas, and populations for CCA clusters. In Section 5 we propose a model that can integrate
the findings, and we summarize our conclusions in Section 6.



2 Data and Methods 



2.1 Raw data 

The data for the USA consist of the location and population of 61,224 points located
throughout the area of the USA (U.S. Census Bureau 2001). Each point corresponds to a
Federal Information Processing Standard (FIPS) census tract code (National Institute of Standards and Technology
2008) generated by the US Census Bureau, with tracts ranging in population from 1,500 to 8,000 people
and a typical size of about 4,000 people. FIPS codes are uniquely specified by 11 digits.
The first 2 digits correspond to the state code, the next 3 to the county within the state, and
the last 6 to the census tract code. For example, FIPS 36061016500 corresponds
to New York State (36), New York County (Manhattan, 061), census tract 016500, which is
an area ranging from 58th Street to 60th Street and from 8th Ave. to 9th Ave. Figure 1
shows all FIPS for Manhattan Island in New York City and its surroundings. The FIPS
are not evenly spaced. For instance, the shortest distance between two FIPS
is about 100 m, as occurs in Manhattan, while in less populated areas like Wyoming,
FIPS can be separated by about 100 km.
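
As an illustration of the 11-digit layout described above, the following minimal Python sketch splits a tract code into its state, county, and tract fields. The names `TractCode` and `parse_fips` are hypothetical and chosen only for this example; they are not part of any Census Bureau tool.

```python
from typing import NamedTuple


class TractCode(NamedTuple):
    """Hypothetical container for the three fields of a tract code."""
    state: str   # first 2 digits
    county: str  # next 3 digits
    tract: str   # last 6 digits


def parse_fips(code: str) -> TractCode:
    """Split an 11-digit census tract FIPS code into its three fields."""
    if len(code) != 11 or not code.isdigit():
        raise ValueError(f"expected 11 digits, got {code!r}")
    return TractCode(state=code[:2], county=code[2:5], tract=code[5:])


# The example from the text: New York State, New York County, tract 016500.
print(parse_fips("36061016500"))
```

The fields are kept as strings so that leading zeros (as in county 061) are preserved.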


Figure 1: Raw data for Manhattan. In this plot we show all FIPS codes corresponding 
to Manhattan Island obtained from the raw data for the USA. Each point corresponds to a 
FIPS code specified by the US Census Bureau. 

The data for Great Britain (GB) are uniformly gridded at high resolution. They consist of a
grid with cell size 200 m overlaid on the area of GB, for which the population in each cell is
given. The source of the GB data is the ESRC (The 1981 and 1991 population census, Crown Copyright, E[...]
2009); the grid is composed of 5.75 million square cells comprising a total population of about 55
million inhabitants in 1991. Given that the GB data are more fine-grained than those of the US, they
are arguably of higher quality. All datasets and results used and presented in this work may be
downloaded from our web page.



2.2 The City Clustering Algorithm (CCA) 



We start this section by providing a detailed explanation of the CCA (Rozenfeld et al. 2008).
In Fig. 2a we show four steps of the CCA when it is applied to the USA. To define a CCA
cluster, we first locate a populated site. Then, we recursively grow the cluster by adding
all nearest-neighbor sites (populated sites within a distance smaller than the coarse-graining
level, $\ell$, from any site within the cluster) with a population density, D, larger than a threshold
D*. The cluster stops growing when no site outside the cluster with population density
D > D* is at a distance smaller than $\ell$ from the cluster boundary. In this work, to minimize
the number of free parameters, we set the threshold D* = 0, and therefore clusters are
recursively grown by merging all populated sites within a distance smaller than $\ell$ from any
site within the cluster.
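
The growth rule for D* = 0 can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the implementation used for the results: sites are given as hypothetical `(x, y, population)` tuples with planar coordinates in km, distances are Euclidean, and neighbors are found by a brute-force scan, which would be too slow for the full dataset.

```python
import math
from collections import deque


def continuum_cca(points, ell):
    """Group populated sites into clusters (threshold D* = 0).

    points: list of (x, y, population) tuples, coordinates in km (assumed).
    ell: coarse-graining distance in km.
    Returns a list of clusters, each a sorted list of indices into `points`.
    """
    # Only populated sites take part in the clustering.
    unassigned = {i for i, p in enumerate(points) if p[2] > 0}
    clusters = []
    while unassigned:
        seed = unassigned.pop()
        cluster, frontier = [seed], deque([seed])
        while frontier:
            i = frontier.popleft()
            xi, yi, _ = points[i]
            # Unassigned sites closer than ell to site i join the cluster.
            near = [j for j in unassigned
                    if math.hypot(points[j][0] - xi, points[j][1] - yi) < ell]
            for j in near:
                unassigned.remove(j)
                cluster.append(j)
                frontier.append(j)
        clusters.append(sorted(cluster))
    return clusters


def cluster_population(points, cluster):
    """Population of a cluster: sum over its member sites."""
    return sum(points[i][2] for i in cluster)
```

Because a cluster is a connected component of the "closer than $\ell$" relation, the result does not depend on which site is picked as the seed; only the order in which clusters are returned may vary.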







Figure 2: a, CCA applied to the USA (continuum CCA). The points in this figure denote
populated sites or FIPS. For our studies (and in this diagram) we use a density threshold
D* = 0. (i) We start a cluster by selecting a populated site (red point) among all available
populated sites. We draw a circle of radius $\ell$ and add all populated sites (blue point) that
fall within the circle. (ii) We draw a circle from the new member of this cluster and add all
populated sites (denoted by the two blue points) within the circle. (iii) Recursively, we keep
drawing circles from all new cluster sites. The populated sites inside the circles (three blue
points in this case) are merged into the cluster. (iv) The red points are the members of the
cluster. Since no black point is at a distance smaller than $\ell$ from any red point, the cluster
does not grow anymore. We start the process again by selecting another initial point that has
not already been assigned to any cluster. This process is repeated until all populated sites
are assigned to a cluster. Notice that the choice of the initial condition, the first selected
point, does not influence the outcome. b, CCA applied to GB (discrete CCA). (i) Cells are
colored in blue if they are populated; otherwise they are in white. (ii) We initialize the CCA
by selecting a random populated cell (red cell). Then, we merge all populated neighbors of
the red cell as shown in (iii). We keep growing the cluster by iteratively merging neighbors
of the red cells until all neighboring cells are unpopulated, as shown in (iv). Next, we
pick another populated cell not yet assigned to a cluster and repeat the algorithm until all
populated cells are assigned to a cluster.

Once the clusters are built, we calculate the population of a cluster as the sum of the
populations of all sites within the cluster. Figure 3a shows a map of all identified clusters in
the continental USA where colors correspond to the cluster population, and Fig. 3b shows a
detail of the clusters in the northeastern USA for different $\ell$.

Since the data of GB is already gridded, the CCA adopts a simpler form than in
the USA. To apply the CCA to the GB data we start from a populated cell and at each step
we grow the cluster by adding all populated cells neighboring the boundary of the cluster
(see Fig. 2b). The cluster stops growing when all cells neighboring the cluster have a density
no greater than D*.

As mentioned before, the data for GB differs from that of the USA, consisting of a grid
with cell size 200 m overlaid on the map of GB. For this reason, since the data is already
gridded at a very high resolution, we simply merge cells from the original data to obtain
larger levels of coarse-graining at different grid sizes $\ell$. We call this version of the CCA the
discrete CCA, while the version applied to the USA is the continuum CCA (see Fig. 2).
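
The discrete version can be sketched in the same spirit. The code below is a minimal illustration, not the code used for the results. It assumes that the grid is a list of rows of cell populations, that coarse-graining sums the populations of square blocks of cells, and that "neighboring" means the four cells sharing an edge; the text does not state the neighborhood, so this is our assumption. The function names are hypothetical.

```python
from collections import deque


def coarse_grain(grid, factor):
    """Merge factor-by-factor blocks of cells by summing their populations."""
    rows, cols = len(grid), len(grid[0])
    out_rows = (rows + factor - 1) // factor
    out_cols = (cols + factor - 1) // factor
    coarse = [[0] * out_cols for _ in range(out_rows)]
    for r in range(rows):
        for c in range(cols):
            coarse[r // factor][c // factor] += grid[r][c]
    return coarse


def discrete_cca(grid):
    """Return clusters of populated cells (D* = 0) as lists of (row, col)."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    clusters = []
    for r0 in range(rows):
        for c0 in range(cols):
            if grid[r0][c0] <= 0 or (r0, c0) in seen:
                continue
            seen.add((r0, c0))
            cluster, frontier = [], deque([(r0, c0)])
            while frontier:
                r, c = frontier.popleft()
                cluster.append((r, c))
                # Assumed 4-cell neighborhood: up, down, left, right.
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] > 0 and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        frontier.append((nr, nc))
            clusters.append(cluster)
    return clusters
```

With a 200 m base grid, `coarse_grain(grid, 5)` would correspond to $\ell = 1$ km, and the area of a cluster is then the number of its cells times $\ell^2$.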



3 Population Distribution 

3.1 Basic Results 

We analyze the population data in the USA and GB to obtain the distribution, P(S),
measuring the probability density that a cluster has a population between S and S + dS.
Figure 4 shows the results of P(S) for the USA for $\ell = 2$ km, $\ell = 3$ km, and $\ell = 4$ km,
for which we obtain 30,201, 23,499, and 19,912 clusters, respectively. We find that the
population distribution follows a power law of the form:

$$P(S) \sim S^{-\zeta - 1}, \qquad (1)$$

with an exponent of $\zeta \approx 1$, in approximate accordance with the value of Zipf's law. For
example, when we estimate the exponent for $\ell = 3$ km and for clusters with S > S* = 12,000
inhabitants (comprising 63% of the country's population) we find $\zeta = 0.97 \pm 0.03$ using an
OLS estimator (the notation ± means that the standard deviation is 0.03). Figure 5 shows
the Zipf exponent $\zeta$ for the USA for several values of $\ell$. We observe that the exponent $\zeta$
remains approximately within 5% of the Zipf value in the range $\ell \in [2.5, 3.5]$ km.

Figure 6 displays the population distribution of the CCA clusters in GB for $\ell = 0.2$ km,
$\ell = 0.6$ km, $\ell = 1$ km, $\ell = 1.8$ km, $\ell = 2$ km, and $\ell = 2.6$ km. For clusters with a population
above a cutoff S* = 5,000 inhabitants, the GB population follows a power law to a good
degree of approximation. Using an OLS regression, we estimate for $\ell = 1$ km (1,008 clusters
with 83% of the country's population) a Zipf exponent $\zeta = 1.07 \pm 0.03$. As in the case of
the USA, the exponent is similar for different choices of grid size $\ell$.



Figure 3: CCA clusters in the USA. a, CCA clusters applied to the entire USA. The map
shows the different clusters obtained by the algorithm. The color indicates the population of
each urban cluster (in logarithmic scale). b, Results of the CCA applied to the major clusters
of the northeastern USA at different length scales. The top left panel shows the CCA clusters
for $\ell = 1$ km, separating the cities of Washington D.C., Baltimore, Philadelphia, Newark,
Jersey City, New York, and Boston. The top right panel shows the results of the algorithm
when the data is coarse-grained to $\ell = 2$ km. Here, for example, the cities of New York,
Newark and Jersey City become part of the same cluster. The lower left panel shows the
results for $\ell = 4$ km, where the main clusters are Washington D.C.-Baltimore; Philadelphia;
New York-Newark-Jersey City-Long Island; and Boston-Cambridge. The lower right panel
for $\ell = 8$ km shows a giant cluster comprising all major cities in the northeastern USA. The
gray points are also identified as part of other clusters but for clarity we do not specify them
with individual colors in this figure.

Figure 4: Probability distribution of cluster populations P(S) for the USA at different
coarse-graining scales $\ell$. The black solid line denotes a power-law function with exponent -2, i.e.
Zipf's law.

Figure 5: Zipf exponent $\zeta$ obtained for the USA clusters at different $\ell$. The error bars
correspond to ±1 standard deviation.

Figure 6: Probability distribution P(S) for GB clusters at different coarse-graining scales $\ell$.

To formally study the validity of our power-law fits, we employ the test proposed by Gabaix and Ibragimov
(2010) and Gabaix (2009), offering a simple quantification of possible deviations from a pure
power law. The test for quadratic deviations is used to determine if a power law is adequate
to describe the city size distribution. The method is as follows. Sort the cities by size, with
rank i (i = 1 being the largest city), and run the OLS regression

$$\ln(i - 1/2) = \text{constant} - \zeta \ln S_i + q\,(\ln S_i - \gamma)^2, \qquad (2)$$

where $\zeta$ (the power-law exponent) and q (the quadratic deviation from a power law) are
the parameters to estimate and $\gamma = \mathrm{cov}((\ln S_j)^2, \ln S_j) / (2\,\mathrm{var}(\ln S_j))$. The recentering term
$\gamma$ ensures that the exponent $\zeta$ is the same whether the quadratic term is included or not,
and therefore $\zeta$ may be estimated beforehand using a simple linear OLS. The quadratic
test formalizes the intuition that a pure power law has q = 0 in the asymptotic limit, so a
high value of |q| indicates deviations from power-law behavior. Under the null of a power
law, for large samples $\sqrt{2N}\, q / \zeta^2$ converges to a standard normal distribution (where N is
the number of data points). With probability 0.99, a standard normal is less than 2.57 in
absolute value. Hence, let $q_c = 2.57\, \zeta^2 / \sqrt{2N}$ be the critical value for the absolute value
of the quadratic term q at the 1% significance level. If $|q| > q_c$ we reject the hypothesis
that the data is well described by a power law, since the quadratic term becomes significant.
Otherwise, if $|q| < q_c$, the quadratic term is insignificant and we do not reject the power-law
hypothesis.
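
The test can be sketched in Python as follows. This is a minimal illustration assuming NumPy is available. The function name `quadratic_power_law_test` and the returned dictionary are our own choices for the example, and the caller is assumed to have already restricted the sample to cities above the cutoff S*.

```python
import numpy as np


def quadratic_power_law_test(sizes, z_crit=2.57):
    """Rank-size regression with a recentered quadratic term.

    sizes: city populations above the chosen cutoff (assumed pre-filtered).
    z_crit: two-sided standard normal critical value (2.57 for the 1% level).
    """
    s = np.sort(np.asarray(sizes, dtype=float))[::-1]   # largest city first
    n = s.size
    log_s = np.log(s)
    y = np.log(np.arange(1, n + 1) - 0.5)               # ln(i - 1/2)

    # Recentering term: gamma = cov((ln S)^2, ln S) / (2 var(ln S)).
    gamma = np.cov(log_s ** 2, log_s)[0, 1] / (2.0 * np.var(log_s, ddof=1))

    # OLS of ln(i - 1/2) on [1, ln S_i, (ln S_i - gamma)^2].
    X = np.column_stack([np.ones(n), log_s, (log_s - gamma) ** 2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    zeta, q = -coef[1], coef[2]

    # Critical value for |q| under the null of a pure power law.
    q_c = z_crit * zeta ** 2 / np.sqrt(2.0 * n)
    return {"zeta": zeta, "q": q, "q_c": q_c, "reject": abs(q) > q_c}
```

With this choice of $\gamma$ the quadratic regressor has zero sample covariance with $\ln S_i$, which is why the estimate of $\zeta$ returned here coincides with the one from the linear regression alone.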

For the USA, when we consider the distribution of city sizes for cities larger than S* =
12,000 for $\ell = 3$ km, we obtain |q| = 0.0291 and $q_c = 0.0413$. Since $|q| < q_c$, we conclude that
we can disregard the quadratic correction to the OLS fit and consider that the power law
describes the empirical distribution of city sizes. In the case of GB, we consider S* = 5,000
and $\ell = 1$ km, for which |q| = 0.0521 and $q_c = 0.0522$. Although |q| and $q_c$ are very close, the
fact that $|q| < q_c$ indicates that we cannot reject the hypothesis that the power law describes
the city size distribution for GB. We conclude that Zipf's law is a good description of city
sizes with population above S* = 12,000 inhabitants in the USA and S* = 5,000 inhabitants
in GB. This comprises 1,947 clusters (for $\ell = 3$ km) and a population of 171.3 million out
of a total population of 271.1 million in the USA, and 1,007 clusters (for $\ell = 1$ km) and a
population of 45.3 million out of a total population of 54.5 million in GB, in contrast to
previous samples (Soo 2005) typically having a few hundred cities.



So far, we have only focused on the part of the distribution where a power-law fit could
not be statistically rejected. Now, somewhat more loosely, we turn to a visual inspection of
Fig. 4 and Fig. 6. We see that the distribution is arguably well approximated by a power
law in a region covering cities above 3,000 inhabitants in the USA, and cities above 300
inhabitants in GB. The deviations from the power law, while statistically significant, are not
very large economically. Hence, we also submit that, for the modelling of cities, the domain
of an approximate power law is quite large. This domain comprises 17,609 clusters and a
population of 259.3 million (96% of the total population) in the USA, and 9,214 clusters and
a population of 53.1 million (96% of the total population) in GB.



3.2 Comparison between CCA clusters, MSAs, and Places 

Although the CCA allows one to choose the observation level of population clusters, $\ell$, it
may be desirable to have an objective way to choose $\ell$. For this purpose, we perform a
comparison with the MSAs in the USA, which may be considered a benchmark for plausibly
well-constructed cities. MSAs are defined starting from a highly populated central county
with population larger than 50,000 and adding its surrounding counties if they have social
or economic ties such as large commuting patterns between the regions. Figures 7a and 7b
show a comparison between the MSAs of the northeastern USA and the clusters obtained
using CCA.

In order to find the value of $\ell$ that best matches the MSAs we match each MSA with the
most populated overlapping CCA cluster. For this purpose, from the US Census Bureau, we
obtain the counties (and corresponding FIPS) that belong to each MSA. An overlap between
an MSA and a CCA cluster exists if they share at least one FIPS code. This overlapping
procedure leads to several CCA clusters corresponding to one particular MSA. To obtain a
one-to-one correspondence, among all overlapping CCA clusters we select the one with the
largest population. We compare the size of the obtained CCA cluster with the corresponding
MSA by computing the correlation, $\rho(\ell)$, between the logarithm of the cluster population,
$S_i^{CCA}(\ell)$, and the logarithm of the population of the MSA, $S_i^{MSA}$. Figure 8a shows the cross-plot
of $\log S_i^{MSA}$ versus $\log S_i^{CCA}(\ell)$ for $\ell = 3$ km, displaying an approximately linear behavior.
Figure 8b shows the correlation analysis between CCA clusters and MSAs by plotting $\rho(\ell)$
for other values of $\ell$. We quantify the regression, $\log S_i^{CCA}(\ell) = a(\ell) + b(\ell) \log S_i^{MSA}(\ell)$,
by measuring the value of the linear regression slope $b(\ell)$ as a function of $\ell$. We find that
$b(\ell) \approx 1.2$ for $\ell > 2$ km. Correlation in log sizes is very good for values of $\ell$ between 2 km
and 6 km; the correlation, displayed in Fig. 8b, is very high for this range of $\ell$. We find that
$\rho(\ell)$ exhibits a maximum value of $\rho \approx 0.91$ for $\ell \in [2.5, 3.5]$ km, so that we consider $\ell = 3$
km as the optimal value.
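
The matching rule and the log correlation can be sketched as follows. This is a minimal illustration with hypothetical data structures: `msa_tracts`, `tract_cluster`, and `cluster_pop` are plain dictionaries invented for the example and do not correspond to any Census Bureau file format.

```python
import math


def match_msas_to_clusters(msa_tracts, tract_cluster, cluster_pop):
    """For each MSA, pick the most populated CCA cluster sharing a FIPS code.

    msa_tracts: dict, MSA name -> iterable of FIPS codes in the MSA.
    tract_cluster: dict, FIPS code -> CCA cluster id.
    cluster_pop: dict, CCA cluster id -> cluster population.
    """
    match = {}
    for msa, tracts in msa_tracts.items():
        # All clusters that share at least one FIPS code with this MSA.
        overlapping = {tract_cluster[t] for t in tracts if t in tract_cluster}
        if overlapping:
            match[msa] = max(overlapping, key=cluster_pop.__getitem__)
    return match


def log_correlation(xs, ys):
    """Pearson correlation between ln(xs) and ln(ys)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    return sxy / math.sqrt(sxx * syy)
```

Repeating the match for each value of $\ell$ and passing the paired MSA and cluster populations to `log_correlation` would trace out a curve of the kind described for $\rho(\ell)$.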
Figure 7: Comparison between the MSAs and the CCA clusters. a, MSAs for the
northeastern USA. For example, New York County (Manhattan), with a population larger
than 50,000, is the center of an MSA. Jersey City belongs to the same MSA since a large share
of its population commutes to Manhattan, setting economic and social ties between the two
regions. b, CCA clusters for the northeastern USA for $\ell = 5$ km. Each cluster or MSA is
plotted with a different color. For instance, the MSA centered in New York City (in green in
a) is composed of several clusters. The largest overlapping cluster found with the CCA is in
green in b. The white concentric circles correspond to the location of the state capitals in the
considered region. The star denotes Washington D.C. and the white full circle corresponds
to New York City.

Figure 8: a, Population of the CCA cluster in the USA for $\ell = 3$ km vs. its corresponding
MSA, using the one-to-one correspondence explained in the text. b, Correlation analysis
between CCA clusters and MSAs by plotting $\rho(\ell)$ for different values of $\ell$. We quantify
the regression, $\ln S_i^{CCA}(\ell) = a(\ell) + b(\ell) \ln S_i^{MSA}(\ell)$, by measuring the value of the linear
regression slope $b(\ell)$ as a function of $\ell$. c, Euclidean distance between MSAs and CCA
clusters.
We present another plausible measure of similarity between MSAs and CCA clusters,
based on the Euclidean distance. We define the distance, $d(\ell)$, between MSAs and CCA as



(3) 



where the sum is over all the MSAs and their corresponding CCA clusters. In Fig. 8c we
show the distance between overlapped MSAs and CCA clusters as a function of $\ell$. We find
that when $\ell = 5$ km the distance (in population) is minimized, and that it is very low
for 2.5 km < $\ell$ < 6 km, in approximate agreement with the log correlation analysis of
Fig. 8a,b.

In addition to the MSAs, we compare the CCA clusters with US Census Bureau "places"
previously analyzed in Eeckhout (2004), where a log-normal distribution of city sizes was
found. We first find a one-to-one correspondence between CCA clusters and places, in
analogy to the previous match between MSAs and CCA clusters. In contrast to MSAs, US
Census places take into account all towns, villages, and cities and are based only on their
administrative or political boundaries (Eeckhout 2004; Holmes and Lee 2009). The smallest
and largest places are Lost Springs, Wyoming, with exactly one resident, and the political
entity of New York City (Manhattan, Brooklyn, Queens, Bronx, and Staten Island) with
population 8.0 million.

From the US Census Bureau we obtain the geographical location of each US Census 
place. Then, we identify each place with a unique FIPS code. Accordingly, each place is 
associated with a unique CCA cluster. This association leads to many places corresponding 
to a single CCA. To obtain the one-to-one correspondence, among all overlapping places we 
consider the one with the largest population. 

In Fig. 9a we show that the smallest cities found with the CCA do not correspond well
to US Census places; however, for cities above population S = 10,000, CCA clusters and Census
places do exhibit a correlation coefficient of $\rho = 0.79$. A detailed comparison between CCA
clusters and places shows that the number of small CCA clusters is smaller than that for
places because the CCA tends to group small places that are geographically connected into a
larger cluster. Therefore, the construction based on places overestimates the number of small
cities and underestimates the number of large cities in comparison with the CCA, resulting in
the size distribution of places being less fat-tailed than the distribution for CCA clusters.
This discrepancy, which may find its root in the fact that places are purely based on legal
boundaries of locations (Holmes and Lee 2009), may explain the finding of a log-normal
distribution of places (Eeckhout 2004), whose full elucidation is beyond the scope of this
paper. Here, we show results for $\ell = 3$ km as representative, but other values of $\ell$ lead to
the same conclusions.

We also perform a comparison between MSAs and places. In Fig. 9b we observe a
good congruence in the whole range for which MSAs are defined. Notice that MSAs by
definition have a minimum population of 50,000. Therefore, when looking for the one-to-one
correspondence, only large places are considered, leading to a good congruence, as found
between large CCA clusters and large places, with correlation $\rho = 0.87$.

Figure 9: a, Log of population of US Census places vs. the log of population of their
corresponding CCA clusters for $\ell = 3$ km. The straight line corresponds to a least-squares
fit with slope b = 0.90 ± 0.02 and y-intercept a = 0.25 ± 0.09, from which the correlation
coefficient $\rho = 0.79$ is obtained for cities with population larger than 10,000. b, Log of
population of US Census places vs. the log of population of their corresponding MSA. The
straight line corresponds to a least-squares fit with slope b = 0.84 ± 0.03 and y-intercept
a = 0.37 ± 0.14, from which the correlation coefficient $\rho = 0.87$ is obtained.

3.3 Robustness Checks 

In this section we test whether the results shown in Section 3.1 could be forced by the
CCA, or in other words, whether they could be an artifact of the CCA. Starting with the
actual location of the FIPS in the USA, we randomize the data by placing all 61,224 FIPS
at random positions in a rectangle of the same area as the USA. Then we apply the CCA to
obtain the corresponding clusters. This randomization procedure preserves the population
of each FIPS. In Fig. 10 we show the population distribution for the shuffled data and for
the original data. These results show that the shuffled data does not exhibit Zipf's law. The
largest cluster for the shuffled data contains 196,112 inhabitants: the reshuffling prevents
the emergence of very large clusters. This suggests that the CCA is not forcing the data to
present a power law for the population distribution, and that Zipf's law arises purely from
the data.
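
A toy version of this randomization can be sketched as follows. It is only an illustration: the rectangle dimensions, the seed, and the function names are hypothetical, positions are drawn uniformly at random, and the clustering uses a simple union-find over all pairs of sites rather than an optimized CCA.

```python
import math
import random


def shuffle_positions(points, width, height, seed=0):
    """Place every site at a uniform random position, keeping its population."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height), pop)
            for _, _, pop in points]


def cluster_populations(points, ell):
    """Cluster populations, largest first, for sites linked at distance < ell."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = math.hypot(points[i][0] - points[j][0],
                              points[i][1] - points[j][1])
            if dist < ell:
                parent[find(i)] = find(j)

    totals = {}
    for i, (_, _, pop) in enumerate(points):
        root = find(i)
        totals[root] = totals.get(root, 0) + pop
    return sorted(totals.values(), reverse=True)
```

Comparing `cluster_populations(points, ell)` with `cluster_populations(shuffle_positions(points, width, height), ell)` gives the two size distributions contrasted in the text: the individual site populations are identical, and only their spatial arrangement differs.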
Figure 10: Population distribution for shuffled data. The black lines correspond to the
real data studied in Section 3.1. The red lines correspond to the shuffled data, showing a
change in the population distribution and suggesting that the results of Section 3.1 are not
an artifact of the CCA.

4 Investigation of the Geography of Cities: Areas and Densities

4.1 Areas 

The CCA presents a unique feature in that it allows the definition of the area of cities not
based on administrative boundaries. Such a feature is not present in agglomerations defined
by places or MSAs. Thus, the spatial analysis of the CCA allows us to examine a possible
feature of the origin of Zipf's law: highly populated cities may have a large geographic
area. Therefore, it is of interest to study the distribution of areas (Makse, Havlin and Stanley
1995), P(A), defined by the CCA.

As explained above, the data of GB consists of a high-resolution grid with cell size 200 m.
Therefore, after applying the CCA, we calculate the area of a cluster in GB as the number
of cells in the cluster multiplied by the area of a cell, $\ell^2$.

The case of the USA is more complicated. The data consists of 61,224 populated points
on the map. Each point corresponds to a different FIPS code, defined by the US Census
Bureau. USA FIPS are simply a partition of the map of the USA, so that any point in
the map belongs to one FIPS code, and each FIPS has an associated area which is given
by the US Census Bureau in the dataset. In the USA, FIPS codes are not homogeneously
distributed. In the New York City area, there is high resolution, which means that there
are many FIPS covering a small area, but in the state of Wyoming or Utah the resolution
is quite low, so that there are FIPS with a large area. For instance, FIPS in Manhattan
typically cover an area of about 0.20 km$^2$ while in the state of Utah FIPS 49003960100 covers
a large area of 15,962 km$^2$. Therefore, when $\ell$ is of the order of a few kilometers, a FIPS in
the Wyoming area will remain isolated in its own cluster, but still its area will be extremely
large, typically a couple of orders of magnitude larger than $\ell^2$. Therefore, since the area of
isolated points is very large, these points will appear at the tail and in the middle of the
distribution P(A), overestimating the outcome for middle and large areas. Accordingly, in
order to compute P(A), we do not take into account clusters containing only 1 or 2
FIPS since they overestimate the amount of land they cover. Moreover, the population of
those isolated points is typically small and rarely exceeds S = 10,000. In fact, we find that
removing all clusters with only 1 or 2 FIPS is practically the same as removing all clusters
with population smaller than 10,000: only 7% of clusters with 1 or 2 FIPS have a population
larger than 10,000.

In Fig. 11a we report the results of P(A) for the USA. We find a power-law distribution
of the form

$$P(A) \sim A^{-\zeta_A - 1}, \qquad (4)$$

with a Zipf exponent $\zeta_A = 1.07 \pm 0.04$ for $\ell = 3$ km. In Fig. 11b we show the results
of P(A) for GB. As for the USA, we find that the area distribution for GB follows a
power law with exponent $\zeta_A = 0.97 \pm 0.04$ for $\ell = 1$ km. This extends the results obtained
in (Makse et al. 1998) for area distributions surrounding a city like London and
Berlin (Makse, Havlin and Stanley 1995) and in the UK (Makse et al. 1998). The result of
Zipf's law for areas in the US appears to be new.

This result may be an important update for calibrated models of cities where transport
costs of goods or people play an important role (Brakman, Garretsen and van Marrewijk
2009; Fujita, Krugman and Venables 2001). Zipf's law for areas implies that some cities
have very large areas, and those cities' viability may mean that transport costs cannot be
too large, or are mitigated in economically interesting ways. We come back to this topic in
Section 5.

In Fig. 12 we study the correlations between areas and populations for the USA and
GB. We find that the linear OLS regression $\ln S = a + b \ln A$ leads to the results shown in
Table 1, indicating a strong correlation between areas and population in log sizes. Indeed,
the finding of $b \approx 1$ indicates that population is, to a good degree of approximation, simply
proportional to area. This finding motivates us to study city density in more detail.
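The log-log regression above can be reproduced on any list of cluster areas and populations. The following sketch runs it on synthetic data; the function name `ols_loglog`, the simulated inputs and their parameters are assumptions for illustration, and only the regression form $\ln S = a + b \ln A$ comes from the text.

```python
import numpy as np

def ols_loglog(area, pop):
    """OLS fit of ln(pop) = a + b ln(area), with standard errors and R^2."""
    ln_a, ln_s = np.log(area), np.log(pop)
    X = np.column_stack([np.ones_like(ln_a), ln_a])
    coef, *_ = np.linalg.lstsq(X, ln_s, rcond=None)
    resid = ln_s - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)                 # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1.0 - resid @ resid / np.sum((ln_s - ln_s.mean()) ** 2)
    return {"a": coef[0], "b": coef[1], "se_a": se[0], "se_b": se[1], "r2": r2}

# Synthetic clusters in which population is proportional to area, up to noise.
rng = np.random.default_rng(1)
area = np.exp(rng.normal(3.0, 1.0, size=1_000))
pop = np.exp(6.5 + 1.0 * np.log(area) + rng.normal(0.0, 0.3, size=1_000))
fit = ols_loglog(area, pop)
print({key: round(float(val), 3) for key, val in fit.items()})
```

A slope close to 1 in such a fit is what the text interprets as population being roughly proportional to area.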



Figure 11: a, Probability distribution of the areas, $P(A)$, for the USA for different $\ell$. b,
Probability distribution $P(A)$ of the areas of the clusters in GB at different coarse-graining
scales $\ell$. The distribution of city areas for GB is also consistent with Zipf's law. We find
$\zeta_A = 0.97 \pm 0.04$ for $\ell = 1$ km. The black solid lines denote Zipf's law, i.e. a power-law
function with exponent $-2$.




Figure 12: Logarithm of the population, $S$, versus the logarithm of the area, $A$, for a, the
USA with $\ell = 3$ km and b, GB with $\ell = 1$ km. The black lines denote the OLS regression
(see Table 1).

Table 1: Results of the OLS regression analysis of $\ln S = a + b \ln A$, where $A$ is the area
and $S$ the population. We report results for $S^* = 12{,}000$ and $\ell = 3$ km for the USA, and
$S^* = 5{,}000$ and $\ell = 1$ km for GB. Standard errors are reported in parentheses.

                  USA        GB
ln A              0.958      1.065
                  (0.020)    (0.007)
Constant          6.567      8.166
                  (0.085)    (0.010)
Observations      1064       1007
R^2               0.686      0.921



4.2 Densities 

In this section we study the population density, $D = S/A$.¹ We study the behavior of $D$
versus $S$ and $A$ by performing the linear regressions $\ln D = a + b \ln A$ and $\ln D = a + b \ln S$.
Table 2 shows the results of the OLS regression estimates with $S^* = 12{,}000$ and $\ell = 3$
km for the USA, and $S^* = 5{,}000$ and $\ell = 1$ km for GB (other choices of $\ell$ lead to the
same conclusions). We find that population density has very little relation to the area: the
coefficients are very close to 0. It has a slightly stronger link with population. Of course,
measurement error in the variables may bias the estimates.
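Because $\ln D = \ln S - \ln A$, the regression of $\ln D$ on $\ln A$ is an algebraic transformation of the regression of $\ln S$ on $\ln A$: the slope is reduced by exactly 1 and the constant is unchanged. The reported estimates are consistent with this ($-0.042 = 0.958 - 1$ and $0.065 = 1.065 - 1$, with constants 6.567 and 8.166 in both tables). The short check below verifies the identity numerically; the synthetic data and variable names are assumptions for illustration.

```python
import numpy as np

# Synthetic log areas and log populations (not the paper's data).
rng = np.random.default_rng(0)
ln_A = rng.normal(3.0, 1.0, size=1_000)
ln_S = 6.5 + 0.96 * ln_A + rng.normal(0.0, 0.5, size=1_000)
ln_D = ln_S - ln_A  # log density, D = S / A

# np.polyfit returns the coefficients from highest degree down: [slope, intercept].
b_S, a_S = np.polyfit(ln_A, ln_S, 1)
b_D, a_D = np.polyfit(ln_A, ln_D, 1)

slope_gap = b_D - (b_S - 1.0)  # zero up to rounding error
const_gap = a_D - a_S          # zero up to rounding error
print(f"b_S = {b_S:.3f}, b_D = {b_D:.3f}, gaps = {slope_gap:.1e}, {const_gap:.1e}")
```

The density regression on $\ln A$ therefore carries no information beyond Table 1; the regression on $\ln S$ is the genuinely separate estimate.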

Still, the link between density and area is perhaps surprisingly weak. Some urban systems, 
like New York City, are quite dense, but even then, the effects are moderate: the density 
of New York City is only 3.7 times the national median even though its population is 485 
times the national median. Of course, we set aside here the interesting
heterogeneity within cities; for the purposes of this paper such a study may be deferred
to later work. We find that density has a very small dispersion: the standard deviation of
its natural logarithm is 0.28 for the USA and 0.09 for GB. In contrast, the corresponding 
quantity for areas and population is about 1. Hence, we conclude that city area covaries 
greatly with population, and little with density. We next propose a model that is consistent 
with this finding, as well as the power law scaling of city sizes. 



¹ See Bryan, Minton and Sarte (2007) for an alternative analysis of density. They find that density has
fallen in the US over the past seven decades.

Table 2: Results of the OLS regression analysis of $\ln D = a + b \ln A$ and $\ln D = a + b \ln S$,
where $D = S/A$ is the density, $A$ the area, and $S$ the population. We report results for
$S^* = 12{,}000$ and $\ell = 3$ km for the USA, and $S^* = 5{,}000$ and $\ell = 1$ km for GB. Standard
errors are reported in parentheses.

                  ln D = a + b ln A                      ln D = a + b ln S
                  USA        GB                          USA        GB
ln A              -0.042     0.065       ln S            0.284      0.099
                  (0.020)    (0.007)                     (0.015)    (0.006)
Constant          6.567      8.166       Constant        3.357      7.299
                  (0.086)    (0.010)                     (0.159)    (0.057)
Observations      1064       1007        Observations    1064       1007
R^2               0.004      0.007       R^2             0.256      0.050



5 Model

Recent economic theories that are compatible with Zipf's law generally rely on the existence
of random growth (Champernowne 1953; Simon 1955; Krugman 1996; Levy and Solomon 1996;
Gabaix 1999a; Dobkins and Ioannides 2001; Davis and Weinstein 2002; Gabaix and Ioannides
2004; Eeckhout 2004; Duranton 2006, 2007; Rossi-Hansberg and Wright 2007; Córdoba 2008):
cities follow a proportional growth process where the distribution of the percentage growth
rate is the same for small and large cities. Small cities, however, grow faster (Glaeser et al.
1992; Glaeser, Scheinkman and Shleifer 1995; Rozenfeld et al. 2008), which prevents the
distribution from becoming degenerate. Some theories obtain Zipf's law only approximately,
and do not obtain it over the range that we find in the present work. Accordingly, we present
a parsimonious model that generates an approximate Zipf's law for population and area.

We first describe the model at a given point in time. Cities are indexed by $i \in [0,1]$.
City $i$ employs $S_i$ workers, and has a competitive sector producing good $i$, which it produces
in quantity $y_i = b_i S_i$, where $b_i$ is the productivity. The aggregate good is a Dixit-Stiglitz
aggregator with elasticity of substitution $\eta > 1$:

$$Y = \left( \int y_i^{(\eta-1)/\eta} \, di \right)^{\eta/(\eta-1)}. \tag{5}$$



There is a potentially unbounded quantity of land, but making usable an area $a$ of land
necessitates an investment $pa$, for some unit cost $p$. This reflects e.g. maintenance cost,
roads and other infrastructure to occupy a land area $a$ (the cost is in the consumption good,
but it could equivalently be in units of labor). Hence, as in Rossi-Hansberg and Wright
(2007), land use is endogenous. As a result, if $A$ is total land use, and $C$ total consumption,
the resource constraint is $C + pA \le Y$.

Consumers' utility is $u(c, a) = c^{1-\beta} a^{\beta}$ with $0 < \beta < 1$, where $c$ is the consumption of
the good, and $a$ the consumption of land. Workers are free to choose their cities, so that
utilities are the same across cities. Hence, the competitive equilibrium is also the solution
to the planner's problem that equalizes utility across agents, and allocates population $S_i$
in each city $i$ (subject to $\int S_i \, di = S$, the total population), and allocates their per capita
consumption of good $c_i$ and land area $a_i$, to maximize total utility subject to the resource
constraint:

$$
\begin{aligned}
\max \int u(c_i, a_i)\, S_i \, di \quad & \text{subject to} \\
\int (c_i + p a_i)\, S_i \, di &\le \left( \int (b_i S_i)^{(\eta-1)/\eta} \, di \right)^{\eta/(\eta-1)}, \\
\forall i, j, \quad u(c_i, a_i) &= u(c_j, a_j), \\
\int S_i \, di &= S.
\end{aligned}
$$

The solution method is standard. The labor allocated to producing good $i$, i.e., the
population living in city $i$, is:

$$S_i = S \, \frac{B_i}{B}, \tag{6}$$

where $B_i = b_i^{\eta-1}$ and $B = \int B_i \, di$. GDP is $Y = S B^{1/(\eta-1)}$. A fraction $1 - \beta$ of income is
devoted to consumption of the good, and a fraction $\beta$ to land use. Each consumer purchases
a quantity of land $\nu = \beta B^{1/(\eta-1)}/p$. So, the total quantity of land in city $i$, $A_i$, is $\nu$ times
the number of inhabitants of city $i$:

$$A_i = \nu S_i. \tag{7}$$

Hence city area and city population are proportional.
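The static allocation in (6) and (7) can be checked numerically. The sketch below is an illustration under stated assumptions: the continuum of cities $i \in [0,1]$ is approximated by a finite sample, so integrals over $i$ become sample means, and the function names and parameter values are hypothetical.

```python
import numpy as np

def ces_output(b, S_i, eta):
    """Aggregate output (5), with the integral over i replaced by a mean."""
    r = (eta - 1.0) / eta
    return np.mean((b * S_i) ** r) ** (1.0 / r)

def static_allocation(b, S_total, eta, beta, p):
    """Population (6), land area (7) and GDP implied by productivities b."""
    b = np.asarray(b, dtype=float)
    B_i = b ** (eta - 1.0)                      # elasticity-adjusted productivity
    B = B_i.mean()                              # B = integral of B_i over cities
    S_i = S_total * B_i / B                     # Eq. (6)
    Y = S_total * B ** (1.0 / (eta - 1.0))      # GDP
    nu = beta * B ** (1.0 / (eta - 1.0)) / p    # land per inhabitant
    A_i = nu * S_i                              # Eq. (7)
    return S_i, A_i, Y

# Hypothetical productivities and parameters.
rng = np.random.default_rng(0)
b = rng.uniform(0.5, 2.0, size=1_000)
S_i, A_i, Y = static_allocation(b, S_total=1.0, eta=3.0, beta=0.3, p=2.0)
print(f"GDP from closed form: {Y:.4f}, from Eq. (5): {ces_output(b, S_i, 3.0):.4f}")
```

Plugging the allocation back into (5) recovers the closed-form GDP, and any other allocation with the same total population yields lower output, since the aggregator is concave in $S_i$.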

Next, we wish to see why Zipf's law might arise. We consider a dynamic version of the
above static description. Consumers have utility $E\left[\int e^{-\delta t} u(c_t, a_t)\, dt\right]$ with some discount rate
$\delta$, which is assumed to be sufficiently large for utility to be finite. As there are no adjustment
costs, for given productivities, the dynamic model yields the same allocation of workers and
land across cities as in the static model. We take a model that merges the random growth
models of cities and the model of random growth of firms developed by Luttmer (2007). We
postulate that (elasticity-adjusted) productivity $B_{it}$ of city $i$ evolves as a geometric Brownian
motion:

$$\frac{dB_{it}}{B_{it}} = g \, dt + \sigma \, dz_{it}, \tag{8}$$



where $g$ is the mean of the growth rate of productivity of an existing city, $\sigma$ the volatility
of that growth rate, and $z_{it}$ are independent Brownian motions. However, if a city is too
unproductive, it can "refresh" its productivity as a fraction of the average productivity:

$$B_{it} \ge \pi B_t, \tag{9}$$

where $B_t = \int B_{it} \, di$ is the average productivity and $\pi \in (0, 1)$ is a constant. Here, we simply
postulate that a city can reset its productivity for free, by imitating the average productivity,
but only imperfectly: its reset productivity is only a fraction $\pi$ of the average productivity.
Luttmer (2007) presents a much more elaborate microfoundation for this idea, including the $\pi$,
but the above model is useful for its simplicity. All in all, $B_{it}$ follows a geometric Brownian
motion, reflected at $\pi B_t$. The following Proposition characterizes the behavior of this economy.

Proposition (i) The steady state distribution of city population and city area is a power law
with exponent $\zeta$:

$$\zeta = \frac{1}{1 - \pi}. \tag{10}$$

Indeed, $S_{it}/S_t$, $A_{it}/A_t$ and $B_{it}/B_t$ are all equal and follow the Pareto distribution
$P(X > x) = (x/\pi)^{-\zeta}$ for $x \ge \pi$. The exponent $\zeta$ tends to 1 (the Zipf's law value) when the
friction $\pi$ coming from the reflecting barrier tends to 0.

(ii) City population $S$ is proportional to city area $A$, and density $D = S/A$ is independent
of city size.

(iii) The fraction of income spent on housing is independent of city size.

Proof The proof method is as in Gabaix (1999a) (see also Gabaix (2009) and the references
therein). Denote by $\bar{g}$ the growth rate of $B_t$ on the balanced growth path. The relative share
of city $i$, $s_{it} = B_{it}/B_t$, follows a geometric Brownian motion, with $ds_{it}/s_{it} = (g - \bar{g})\,dt + \sigma\, dz_{it}$,
with a reflecting barrier, $s_{it} \ge \pi$. Calling $\mu = g - \bar{g}$, the steady state density $p(s)$ follows
the Forward Kolmogorov equation:

$$0 = -\left(\mu s \, p(s)\right)' + \frac{1}{2}\left(\sigma^2 s^2 p(s)\right)''.$$

Integration of this equation yields $p(s) = k s^{-\zeta - 1}$ for $\zeta = 1 - 2\mu/\sigma^2$ and some constant $k$.
By construction $E[s] = 1$. Given $\int p(s)\, ds = 1$, we have

$$1 = \frac{\int p(s)\, s \, ds}{\int p(s)\, ds} = \frac{\int_\pi^\infty k s^{-\zeta-1} s \, ds}{\int_\pi^\infty k s^{-\zeta-1} \, ds} = \frac{\zeta \pi}{\zeta - 1},$$

which yields $\zeta = 1/(1 - \pi)$. The steady state distribution can be written $P(s_{it} > x) =
(x/\pi)^{-\zeta}$. Finally, by (6) and (7), the distribution of populations and areas is a Pareto with
the same exponent $\zeta$.

We also note that $\zeta = 1 - 2(g - \bar{g})/\sigma^2$. This yields the value of the growth rate of
average productivity: $\bar{g} = g + \sigma^2(\zeta - 1)/2$. The (endogenous) growth rate of average productivity
is higher than the (exogenous) growth rate of a city above the reflecting barrier, because
this reflecting barrier makes small cities grow faster. In the Zipf limit where $\pi \to 0$, hence
$\zeta \to 1$, the difference between the two growth rates, $\bar{g} - g$, goes to 0. ■
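The steady state in part (i) can be illustrated by simulating the relative sizes $s_{it}$ directly. The sketch below is a rough numerical check under stated assumptions, not part of the proof: it uses an Euler discretization of $\ln s_{it}$ with mirror reflection at the barrier, sets $\mu = -\sigma^2(\zeta - 1)/2$ as implied by $\zeta = 1 - 2\mu/\sigma^2$, and estimates the exponent from the tail above an arbitrary threshold. All names and parameter values are hypothetical.

```python
import numpy as np

def simulate_relative_sizes(pi=0.5, sigma=1.0, n_cities=10_000,
                            dt=0.0025, n_steps=4_000, seed=0):
    """Simulate s = B_i / B as a geometric Brownian motion reflected at pi."""
    zeta = 1.0 / (1.0 - pi)                  # exponent predicted by Eq. (10)
    mu = -0.5 * sigma**2 * (zeta - 1.0)      # mu = g - gbar
    step_mean = (mu - 0.5 * sigma**2) * dt   # Ito drift of ln s over one step
    step_sd = sigma * np.sqrt(dt)
    rng = np.random.default_rng(seed)
    x = np.zeros(n_cities)                   # x = ln(s / pi); start at the barrier
    for _ in range(n_steps):
        x += step_mean + step_sd * rng.standard_normal(n_cities)
        np.abs(x, out=x)                     # mirror reflection at x = 0 (s = pi)
    return pi * np.exp(x), zeta

def tail_exponent(s, threshold):
    """Hill estimate of the Pareto exponent using only values above threshold."""
    tail = s[s > threshold]
    return tail.size / np.log(tail / threshold).sum()

s, zeta_pred = simulate_relative_sizes()
zeta_hat = tail_exponent(s, threshold=1.5 * 0.5)
print(f"predicted zeta = {zeta_pred:.2f}, simulated tail estimate = {zeta_hat:.2f}")
```

With $\pi = 0.5$ the predicted exponent is $\zeta = 2$; lowering $\pi$ toward 0 moves the prediction toward the Zipf value of 1, at the cost of a heavier tail and noisier estimates.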

This economy reflects our main empirical findings, (i) and (ii). Point (iii) reflects the
findings of Davis and Ortalo-Magné (2008), who find that the fraction of income spent on
housing is roughly constant over time and across city sizes.



We note that here, following Rossi-Hansberg and Wright (2007) and Van Nieuwerburgh and Weill
(2009), land is not exogenous but is instead acquired. This is a legitimate modelling
idealization in our view. Take a city such as Dallas, which starts with vast quantities of
unoccupied land around it. It can grow in a fairly unlimited way, but it needs to pay for
the land use, e.g. building infrastructure such as roads, electricity and running water. It
makes sense to model this activity as a constant-returns-to-scale activity, at least to a first
approximation. At the other end of the spectrum, we may have New York. But even it has
grown considerably by geographical expansion, which lends credence to our model. It would
be interesting, and surely desirable, to extend the model with some sort of increasing cost
of land use (given some limit). We conjecture that, if the random growth effects are large
enough, this will modify the power law distribution, but will not eliminate it. A calibration
of the deviation from the constant-returns-to-scale model, and of the resulting deviation from
the power law, would be useful, but we will not attempt it here.

Here cities are basically constant-returns-to-scale economies, except for one large Marshallian
force that makes a given good only producible in one city (as "secrets of the trade"
may be exclusive to that city). Of course, this is a stark model, but it is parsimonious, and
is consistent with our scaling facts. In addition, external effects linked to cities may not
be huge. For instance, Glaeser (1998) reports quantitatively moderate deviations from the
hypothesis that cities are constant-returns-to-scale (see also Bettencourt et al. (2007)). In
particular, Glaeser (1998) reports that the average commute time in cities of less than 100,000
is 20.5 minutes each way, while in cities of more than 1,000,000 it is 31.9 minutes each way.²
This difference may be small compared to the huge differences in size and area that our
model focuses on.

Our model postulates that Gibrat's law holds. However, deviations from Gibrat's law
have been found in the literature and for the CCA clusters (e.g. Glaeser, Scheinkman and Shleifer
(1995) and Rozenfeld et al. (2008)). A simple theoretical solution to this apparent tension
between the data and the idealization used in models based on Gibrat's law is discussed in
Gabaix and Ioannides (2004), Section 3.2.2. Urban growth may accommodate a wide range
of growth processes exhibiting a Pareto distribution, but also deviations from Gibrat's law,
as long as they contain a unit root (which satisfies Gibrat) with respect to the logarithm 
of city size: in particular, growth processes can have some mean-reverting component that 
violates Gibrat's law. Under that hypothesis, the deviations from Gibrat's law would come 
from the mean-reverting component of the growth process, but Gibrat's law in the unit root 
part of the process would ensure the Pareto law. More research is needed to empirically 
assess this possibility. It would likely require empirical studies of Gibrat's law over long time 
intervals. 

We think that the model could be extended to add positive and negative agglomeration
externalities, which also can generate random growth, as in Gabaix (1999b), Eeckhout
(2004), and Rossi-Hansberg and Wright (2007). We also eschew a detailed modelling of
the heterogeneity within a city, such as the one in Lucas and Rossi-Hansberg (2002). Such
developments would be very welcome, but we propose to defer them to future research.

² In a related vein, Ciccone and Hall (1996) estimate that a doubling of density increases productivity by
5.5%, while Davis, Fisher and Whited (2009) find an increase of 2%. This is an arguably small deviation
from the constant-returns-to-scale benchmark we use in our model.



6 Conclusion 

We have used a "bottom-up" approach which allows us to construct cities independently of
their "legal" definition, instead using a more geographical and economic basis. First, the resulting
data extend the domain of validity of Zipf's law to a considerable range: we show that when
cities are constructed independently of their administrative boundaries, Zipf's law appears
to be a genuine regularity for the bulk of the city size distribution. Second, we are able to
analyze city areas, which allows for the estimation of a potentially very important quantity
in urban economics, and anchors the definition of cities much more in geography. We find
evidence for a power-law distribution of areas, with an exponent close to 1. Third, we
present a model incorporating both population and area that matches our "macro" facts.
Fourth, we provide a public good by putting on our web page the correspondence between
ZIP codes and our clusters, so that other researchers can use the agglomerations constructed
with the CCA, and study dimensions of local economics other than areas and populations.

In the present work we have investigated only two countries. It is natural to extend this 
study to more countries, an investigation that might offer confirmation of the scaling laws 
for areas, population and density that we have found, and also perhaps find economically 
interesting deviations from them. The minimalist model presented here could be extended to
incorporate a richer specification of the internal structure of cities. We think that this
"bottom-up" approach could be useful for a host of urban questions. Combining our geographical
approach with land price data could lead to a much more constrained and geography-based 
theory of the macro and internal structure of cities. 